And-Or Parallelism on Shared-Memory Multiprocessors

Authors

  • Gopal Gupta
  • Bharat Jayaraman
Abstract

This paper presents an extended and-or tree and an extended WAM (Warren Abstract Machine) for efficiently supporting both and-parallel and or-parallel execution of logic programs on shared-memory multiprocessors. Our approach for exploiting both and- and or-parallelism is based on the binding-arrays method for or-parallelism and the RAP (Restricted And-Parallelism) method for and-parallelism, two successful methods for implementing or-parallelism and and-parallelism, respectively. Our combined and-or model avoids redundant computations when goals exhibit both and- and or-parallelism by representing the cross product of the solutions from the and-or parallel goals rather than recomputing them. We extend the classical and-or tree with two new nodes: a “sequential” node (for RAP's sequential goals) and a “cross-product” node (for the cross product of solutions from and-or parallel goals). The paper also presents an extension of the WAM, called AO-WAM, which is used to compile logic programs for and-or parallel execution based on the extended and-or tree. The AO-WAM incorporates a number of novel features: (i) inclusion of a base array with each processor’s binding array for constant-time access to variables in the presence of and-parallelism, (ii) inclusion of new stack frames and instructions to express solution sharing, and (iii) novel optimizations which minimize the cost of binding-array updates in the presence of and-parallelism.
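The cross-product node lends itself to a small illustration. The C sketch below is only a toy model under assumed data structures (the solution_list and cp_tuple types and the integer "solution" values are hypothetical stand-ins, not the AO-WAM's actual stack frames): each independent and-parallel goal computes its solution list once, and the cross-product node merely pairs references into those lists, so neither goal is re-executed for any combination.

    /* Minimal sketch, not the AO-WAM: two independent and-parallel goals
     * each produce a solution list once; a cross-product node stores index
     * pairs into those lists instead of re-running either goal. */
    #include <stdio.h>

    #define MAX_SOLUTIONS 8

    /* One solution list per and-parallel goal (hypothetical representation). */
    typedef struct {
        int count;
        int solutions[MAX_SOLUTIONS];  /* stand-in for the goal's bindings */
    } solution_list;

    /* A cross-product tuple: one index into each goal's solution list. */
    typedef struct {
        int left;   /* index into the first goal's solutions  */
        int right;  /* index into the second goal's solutions */
    } cp_tuple;

    int main(void) {
        /* Suppose goal p(X) yielded 3 solutions and goal q(Y) yielded 2,
           each computed exactly once by its own and-parallel branch. */
        solution_list p = { 3, { 10, 11, 12 } };
        solution_list q = { 2, { 20, 21 } };

        /* The cross-product node only pairs up references; no recomputation. */
        cp_tuple tuples[MAX_SOLUTIONS * MAX_SOLUTIONS];
        int n = 0;
        for (int i = 0; i < p.count; i++)
            for (int j = 0; j < q.count; j++)
                tuples[n++] = (cp_tuple){ i, j };

        /* Each tuple drives one continuation of the conjunction. */
        for (int k = 0; k < n; k++)
            printf("p=%d, q=%d\n",
                   p.solutions[tuples[k].left],
                   q.solutions[tuples[k].right]);
        return 0;
    }

In the AO-WAM itself, this kind of pairing is what the new stack frames and instructions for solution sharing (feature (ii) above) express, while the base array of feature (i) is what keeps variable access constant-time when and-parallel branches share solutions this way.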

Similar resources

The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

Current microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...
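Software prefetching, the latency-hiding optimization evaluated in that work, can be sketched briefly. The loop, the array, and the prefetch distance below are assumptions made purely for illustration and are not taken from the cited study.

    /* Illustrative only: software prefetching hides memory latency by
     * requesting data a fixed distance ahead of its use. */
    #include <stddef.h>

    #define PREFETCH_DISTANCE 16  /* elements ahead; tuned per machine in practice */

    long sum_with_prefetch(const long *a, size_t n) {
        long sum = 0;
        for (size_t i = 0; i < n; i++) {
            if (i + PREFETCH_DISTANCE < n)
                /* GCC/Clang builtin: read prefetch, low temporal locality. */
                __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 1);
            sum += a[i];
        }
        return sum;
    }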

Executing Nested Parallel Loops on Shared-Memory Multiprocessors

Cache-coherent, bus-based shared-memory multiprocessors are a cost-effective platform for parallel processing. In scientific parallel applications, most of the computation involves processing of large multidimensional data structures, which results in a high degree of data parallelism. This parallelism can be exploited in the form of nested parallel loops. Most existing shared memory multiprocesso...
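As a rough, modern stand-in for the nested parallel loops described in that blurb, the sketch below uses OpenMP (which postdates the cited work) to distribute a two-dimensional update over a shared-memory machine; the array sizes and the loop body are placeholder assumptions.

    /* Sketch of a nested parallel loop over a 2-D data structure, the form of
     * data parallelism the blurb describes. Compile with -fopenmp. */
    #include <omp.h>

    #define N 1024
    #define M 1024

    void scale_matrix(double (*a)[M], double factor) {
        /* collapse(2) lets the runtime distribute the combined i,j iteration
           space across the processors of a shared-memory machine. */
        #pragma omp parallel for collapse(2)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < M; j++)
                a[i][j] *= factor;
    }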

Effective Instruction Prefetching In Chip Multiprocessors

threaded application performance, often achieved through instruction-level parallelism per chip is increasing, the software and hardware techniques to exploit the potential of studies mostly involve distributed shared memory multiprocessors and fetching will not be fully effective at masking the remote fetch latency. The effective address of the load instructions along that path based upon a hi...

Shared memory multiprocessors

The hardware evolution has reached the point where it becomes extremely difficult to further improve the performance of superscalar processors by either exploiting more instruction-level parallelism (ILP) or using new semiconductor technologies. The effort to increase processor performance by exploiting ILP follows the law of diminishing returns: new, more complex optimisations tend to cost mor...

RSIM: An Execution-Driven Simulator for ILP-Based Shared-Memory Multiprocessors and Uniprocessors

This paper describes RSIM, the Rice Simulator for ILP Multiprocessors. RSIM simulates shared-memory multiprocessors and uniprocessors built from processors that aggressively exploit instruction-level parallelism (ILP). RSIM is execution-driven and models state-of-the-art ILP processors, an aggressive memory system, and a multiprocessor coherence protocol and interconnect, including conten...

Journal:
  • J. Log. Program.

Volume 17  Issue 

Pages  -

Publication date 1993